[python] Refactor FullStartingScanner by introducing PartialStartingScanner class #7032

discivigour wants to merge 3 commits into apache:master
Conversation
```python
if idx_of_this_subtask >= number_of_para_subtasks:
    raise Exception("idx_of_this_subtask must be less than number_of_para_subtasks")
if self.start_pos_of_this_subtask is not None:
    raise Exception("with_shard and with_slice cannot be used simultaneously")
```
Use a more specific exception type than the generic `Exception`, such as `ValueError`.
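A minimal sketch of the suggested fix, assuming a simplified `PartialStartingScanner` that carries only the validation logic (the attribute names come from the diff above; everything else is illustrative):

```python
class PartialStartingScanner:
    """Simplified sketch of the scanner's argument validation, not the real class."""

    def __init__(self):
        # In the real code this would be set by with_slice().
        self.start_pos_of_this_subtask = None

    def with_shard(self, idx_of_this_subtask: int, number_of_para_subtasks: int):
        if idx_of_this_subtask >= number_of_para_subtasks:
            # ValueError is the idiomatic type for an invalid argument value.
            raise ValueError(
                "idx_of_this_subtask must be less than number_of_para_subtasks")
        if self.start_pos_of_this_subtask is not None:
            # Mixing shard and slice configuration is API misuse, not an
            # internal failure, so a specific exception fits better.
            raise ValueError(
                "with_shard and with_slice cannot be used simultaneously")
        self.idx_of_this_subtask = idx_of_this_subtask
        self.number_of_para_subtasks = number_of_para_subtasks
        return self
```

Specific exception types also let callers catch misconfiguration precisely instead of trapping every `Exception`.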
```python
split.shard_file_idx_map[file.file_name] = (0, plan_end_pos - file_begin_pos)
elif file_end_pos <= plan_start_pos or file_begin_pos >= plan_end_pos:
    split.shard_file_idx_map[file.file_name] = (-1, -1)
return file_end_pos
```
Define a named constant for `(-1, -1)`; that makes its meaning clear to other developers.
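A sketch of that suggestion; the constant name is hypothetical, and `classify_file` is an illustrative stand-in for the branch in the diff:

```python
# Hypothetical name: marks a file that lies entirely outside this subtask's
# [plan_start_pos, plan_end_pos) range, replacing the bare (-1, -1) tuple.
SHARD_RANGE_EXCLUDED = (-1, -1)


def classify_file(file_begin_pos, file_end_pos, plan_start_pos, plan_end_pos):
    """Illustrative version of the branch above using the named constant."""
    if file_end_pos <= plan_start_pos or file_begin_pos >= plan_end_pos:
        # File does not overlap this subtask's plan range at all.
        return SHARD_RANGE_EXCLUDED
    # Overlapping file: clamp the plan range to offsets local to this file.
    start = max(plan_start_pos, file_begin_pos) - file_begin_pos
    end = min(plan_end_pos, file_end_pos) - file_begin_pos
    return (start, end)
```

Readers of `shard_file_idx_map` can then compare against `SHARD_RANGE_EXCLUDED` instead of a magic tuple.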
```python
return filtered_partitioned_files, (plan_start_pos, plan_end_pos)


def _append_only_filter_by_shard(self, partitioned_files: defaultdict) -> (defaultdict, int, int):
    """
```
This is similar to the code in `_data_evolution_filter_by_shard`; can you refactor to share it?
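One way such a refactor could look, sketched under assumptions: `filter_by_range` is a hypothetical shared helper that both `_append_only_filter_by_shard` and `_data_evolution_filter_by_shard` could delegate to, and the `row_count` attribute on file metas is an assumption:

```python
from collections import defaultdict


def filter_by_range(partitioned_files, plan_start_pos, plan_end_pos):
    """Hypothetical shared helper: keep files whose cumulative row range
    overlaps this subtask's [plan_start_pos, plan_end_pos) window."""
    filtered = defaultdict(list)
    pos = 0
    for partition, files in partitioned_files.items():
        for file in files:
            begin, end = pos, pos + file.row_count  # row_count is an assumption
            if end > plan_start_pos and begin < plan_end_pos:
                filtered[partition].append(file)
            pos = end
    return filtered
```

Each caller would then only compute its own `(plan_start_pos, plan_end_pos)` and pass it in, keeping the overlap logic in one place.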
```python
if is_blob and not self._is_blob_file(file.file_name):

if self._partial_read():
    partitioned_files = self._filter_by_pos(partitioned_files)
```
Do we need to sort the files before calling `_filter_by_pos`?
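If ordering does matter, here is a hypothetical wrapper that fixes a deterministic order before the positional filter runs; the sort key (file name) is an assumption, and the real code may need a different key such as creation time:

```python
def filter_by_pos_sorted(partitioned_files, filter_by_pos):
    """Sketch: sort each partition's files deterministically so that every
    parallel subtask computes identical positional ranges, then filter."""
    for files in partitioned_files.values():
        files.sort(key=lambda f: f.file_name)  # sort key is an assumption
    return filter_by_pos(partitioned_files)
```

Without a stable order, two subtasks could assign different cumulative positions to the same files and produce overlapping or missing shards.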
```python
self.partial_read = True
self.starting_scanner = self._create_starting_scanner()
self.starting_scanner.with_shard(idx_of_this_subtask, number_of_para_subtasks)
return self
```
If `INCREMENTAL_BETWEEN_TIMESTAMP` is configured, `_create_starting_scanner()` returns an `IncrementalStartingScanner`, which doesn't have `with_shard()`. This will cause an `AttributeError`.
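One possible guard, sketched with a hypothetical helper (`apply_shard` is not existing API):

```python
def apply_shard(starting_scanner, idx_of_this_subtask, number_of_para_subtasks):
    """Fail fast with a clear message when the chosen scanner (for example an
    incremental one) does not implement with_shard(), instead of surfacing a
    bare AttributeError later."""
    with_shard = getattr(starting_scanner, "with_shard", None)
    if with_shard is None:
        raise ValueError(
            f"{type(starting_scanner).__name__} does not support with_shard(); "
            "sharding cannot be combined with this scan mode")
    return with_shard(idx_of_this_subtask, number_of_para_subtasks)
```

An explicit error at configuration time is much easier to diagnose than an `AttributeError` raised deep inside the scan.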
```python
def _filter_by_pos(self, files):
    if self.table.is_primary_key_table:
        return self._primary_key_filter_by_shard(files)
```
This may raise an exception if `with_slice` is called for a primary key table.
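A sketch of an explicit guard for that case; the class and attribute names here are illustrative, not the real ones:

```python
class TableScanSketch:
    """Minimal sketch of the dispatch above with an explicit guard for the
    primary-key + slice combination the reviewer points out."""

    def __init__(self, is_primary_key_table, slice_requested):
        self.is_primary_key_table = is_primary_key_table
        self.slice_requested = slice_requested

    def _filter_by_pos(self, files):
        if self.is_primary_key_table:
            if self.slice_requested:
                # Fail fast with a clear message instead of a confusing
                # exception deep inside the shard filtering.
                raise ValueError(
                    "with_slice is not supported for primary key tables")
            return self._primary_key_filter_by_shard(files)
        return files

    def _primary_key_filter_by_shard(self, files):
        return files  # placeholder for the real shard filtering
```

Checking the unsupported combination up front keeps the failure at the call site that configured it.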